Sorting and grading are qualitative operational tasks performed in food
processing industries. The main aim of this work is to achieve higher
accuracy in mass estimation. An automated technique for mass estimation of
Citrus limetta is devised based on geometrical features derived from
pre-processed images. The dataset includes 250 data samples of Citrus
limetta, whose images are acquired in different orientations. Two novel
handcrafted distance-based geometrical features, along with four
conventional geometrical features, were employed for regression analysis.
Predictive modeling is conducted with 150 training and 100 testing data
samples. Multiple linear regression and support vector regression models
with linear, polynomial and radial basis function (RBF) kernels were
employed for mass estimation with two different feature configurations:
conventional, and conventional with handcrafted features. The proposed model
and configuration achieved an R² score of 0.9815, a root mean squared error
(RMSE) of 10.94 grams, and relative averages of accuracy and error of 96.61%
and 3.39% respectively, and was validated using k-fold cross-validation.
Comparison of the two configurations established that including the
handcrafted features increased the performance of the models.


This is an open access article under the CC BY-SA license. 


Corresponding Author: 
Shobha Rani Narayana Murthy 


Department of Computer Science, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham 
Mysuru Campus, Amrita Vishwa Vidyapeetham, Mysuru, Karnataka, India 


Email: n_shobharani@my.amrita.edu


1. INTRODUCTION 


Citrus limetta, commonly known as Mosambi or Musambi in India, has an excellent market and is the
third-largest fruit cultivated in India. It is cultivated in dry climatic conditions and needs 60-75 cm of
rainfall annually. Citrus limetta and similar citrus fruits provide excellent and unique health benefits.
Among the various post-harvest operations, grading and sorting are the most time-intensive and demand
automated processing techniques. Size and weight are among the crucial parameters in grading and sorting
of fruits. Non-destructive, contactless techniques for grading and sorting are highly beneficial in food
processing industries, as they enable high-speed processing and fit the demands of industries. Computer
vision and machine learning techniques assist in the accurate determination of shape and mass, which
decide the fruit grade. Integration of image processing techniques in fruit processing ensures the
reliability of vision-based grading and sorting systems. Mass estimation supports the assessment of fruit
maturity and, in turn, helps in fruit grading. Robust estimation of mass is essential for real-time fruit
processing; therefore, continuous investigation efforts are underway to enhance the overall precision of
mass estimation.

A survey of significant contributions on mass estimation of a variety of fruits using image processing
and machine learning techniques follows. Mass and volume estimation of cherry tomatoes was investigated
in [1]. The study analyzes the relationship between tomato mass and volume using 2D and 3D image
features. A support vector regression model (SVRM) with a radial basis function (RBF) kernel is used for
the estimation of volume and mass. In a work by Ganjloo et al. [2], image processing techniques are used
for mass and shape estimation through geometrical features and decision tree-based analysis. An
investigation based on image processing and artificial intelligence models is proposed by Lee et al. [3]
for mass estimation of fruits; bagged ensemble tree regressors were adopted for prediction via a
correlation study of fruit images. Regression-based models are developed by Okinda et al. [4] for the
prediction of egg volume, with the extracted shape-based information employed as regression model inputs.
A detailed study of various computer vision systems for weight and volume estimation of poultry-based
products is conducted by Nyalala et al. [5]. Mass estimation of symmetrically shaped fruits is researched
by Yani et al. [6] using computer vision techniques. Assessment of fruit quality through automatic mass
prediction is suggested by Gokul et al. [7] using image processing techniques.

In other works, models for mass and volume estimation of lime fruits are investigated by Jayarmi and
Taghizadeh [8] using non-destructive techniques, viz. physical attributes of objects. Classification of
fruits based on their maturity and a mass indicator is introduced by Iqbal et al. [9]; using color-based
features, discriminant analysis is carried out for feature selection and classification. Mass estimation
of sweet lime based on artificial adaptive neuro-fuzzy inference models is devised in [10]. The goodness
of fit is employed to check the proficiency of the proposed models. In a different work [11], machine
learning techniques and neuro-fuzzy inference systems are extended for the prediction of lime fruits. A
self-built database is used for validation of the estimated weight. Mass estimation of irregularly shaped
fruits and vegetables is investigated in [12]. Volume estimation of strawberries, mushrooms and tomatoes
using geometrical measurements is carried out by Concha-Meyer et al. [13]; correlation-based and
regression analyses are applied between the weight and volume measurements of the datasets. Mass modeling
of Sohiong fruit is investigated by Vivek et al. [14]; physical and mechanical properties are used to
predict the mass. Various models reported in the literature [15]-[17] focus mainly on modeling the volume
and mass of a variety of fruits and objects. Some state-of-the-art papers reported in recent years are
reviewed, and their highlights are presented in Table 1.


Table 1. Key concepts used in literature on automated mass estimation

Samples | Features | Dataset | Methods | Models | Reference
Olive fruit | Major axis, minor axis, and area | 3,600 samples | Image processing, manual measurements | Linear regression model | Ponce et al. [18]
Pistachio kernel | Length and area | 2,000 samples | Image processing, manual measurements | Random-forest (RF) model | Vidyarthi et al. [19]
Mango | Thickness and diameters | 61 samples | Image processing | Artificial neural network (ANN) model | Utai et al. [20]
Orange | Area, eccentricity, perimeter, length, width, area, color, contrast, texture, and roughness | 300 samples | Image processing | Adaptive neuro fuzzy inference system-fuzzy sugeno model | Javadikia et al. [21]
Yellow melon | Area and diameters | 135 samples | Image processing, manual measurements | Linear regression model | Calixto et al. [22]
Citrus fruit | Area | 5,000 samples | Image processing | Naive Bayes classifier and ANN model | Shin et al. [23]
Fishes | Area | 2,500 samples | Image processing | Convolutional neural network model | Konovalov et al. [24]
Almond | Length, breadth and area | 1,000 samples | Image processing, manual measurements | Stacked ensemble model (SEM) with ANN, RF, support vector regression (SVR), k-nearest neighbor and kernel ridge regression | Vidyarthi et al. [25]


From the literature we observe that existing works extensively employ predefined geometrical features,
identified through assumption-based procedures, for measuring fruits and objects and for predicting mass
with machine learning models. It is also inferred from a couple of works that generalization is an issue
when a generic algorithmic procedure is designed for mass estimation of multiple fruit types. However,
food processing is usually specific to a particular fruit type, and a generic approach may simply increase
computational complexity and reduce accuracy. The literature thus points toward heuristic or handcrafted
feature models, which may enhance the scope for achieving high accuracy with an optimal number of
training samples.


2. METHOD 
2.1. Experimental setup and dataset collection 

Experiments were conducted between October 2020 and May 2021 at Amrita Vishwa
Vidyapeetham, Mysuru Campus, India. Citrus limetta fruits were collected from various local markets of
the Mysuru district. The fruits selected for dataset creation are mostly non-defective and cover a range
of sizes and ripeness. Each Citrus limetta is labeled and weighed on a calibrated electronic scale with
an accuracy of ± 0.01 g, and the ground truth data is tabulated.


2.2. Image acquisition setup 

Images were acquired with an imaging system consisting of a Logitech C270 HD webcam mounted on
a tripod camera stand 0.5 meters perpendicularly above a plain uniform background. The camera is
connected to a Windows 10 laptop via a USB port. The inbuilt camera software provided by Microsoft was
used to acquire the images at a resolution of 1280×720 pixels. Each Citrus limetta was imaged in four
different orientations, capturing the variation in the geometrical measurements with orientation and
position. Figure 1(a) shows the experimental setup used for image acquisition and Figure 1(b) shows
sample images acquired in various orientations.


[Figure 1(a) labels: Logitech C270HD Webcam; 0.5 meters; Uniform Platform; Computer]

Figure 1. Image acquisition setup (a) image acquisition setup and (b) sample images acquired in different
orientations


2.3. Pre-processing 

Pre-processing in the proposed method is a basic clean-up procedure that smooths the image so that
object extraction can be accurate. The raw image is first converted to grayscale. To suppress irrelevant
border components, the grayscale image is subjected to Gaussian smoothing. The aim of smoothing is to
suppress irrelevant distortions so that the features of the object are enhanced for subsequent
processing. To extract the region of interest, the Canny edge detection operator is applied, as there is
significant discrimination between the foreground and the background. Finally, small object removal is
performed to remove noisy pixels, together with morphological dilation, which strengthens the object
boundary and fills small openings in it. Figure 2 shows the outcomes of the image pre-processing, with
edge detection in Figure 2(a) and the result after small object removal in Figure 2(b).
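The final clean-up step can be sketched in a few lines. The paper does not state which library, size threshold or structuring element was used, so the function names, the `min_size` value and the 3×3 square element below are all assumptions chosen for illustration. The sketch uses only the Python standard library and operates on a toy binary edge map rather than a real Canny output.

```python
from collections import deque

def remove_small_objects(mask, min_size):
    """Keep only 8-connected foreground components with at least min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first search collects one connected component.
                component, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    component.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                if len(component) >= min_size:
                    for y, x in component:
                        out[y][x] = 1
    return out

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element (an assumed choice)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = r + dy, c + dx
                        if 0 <= ny < rows and 0 <= nx < cols:
                            out[ny][nx] = 1
    return out

# Toy edge map: a ring with a one-pixel gap (the boundary) plus one isolated noisy pixel.
edges = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
without_noise = remove_small_objects(edges, min_size=3)
cleaned = dilate(without_noise)
```

In this toy case the isolated pixel in the bottom-right corner is discarded, and dilation closes the gap in the top edge of the ring. On real images the threshold would need tuning to the image resolution.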


2.4. Feature extraction 

In the proposed method, two novel distance-based handcrafted features are devised. The various
parameters extracted from the region of interest are indicated in Figure 3. A total of six 2D features of
the Citrus limetta were identified, two of which are handcrafted. The features are projected area
($A_p$), boundary length ($B$), major-axis length ($L_1$), minor-axis length ($L_2$), and the two
handcrafted features, top parallel chord length ($\lambda_1$) and bottom parallel chord length
($\lambda_2$).


Figure 2. Pre-processing (a) highlighted noisy pixels after Canny edge detection and (b) outcome of small
object removal along with morphological dilation

Figure 3. Geometrical feature extraction models


$A_p$ is the projected area of the Citrus limetta, that is, the area the object takes up when projected
onto the image plane, or the total number of pixels inside the boundary of the object. $B$ is the
boundary length of the object, computed by counting the number of pixels in the closed contour point set.
The distance parameter $L_1$ is the distance between the major-axis endpoints P and Q, designated by
$(P_1, P_2)$ and $(Q_1, Q_2)$ respectively, expressed using the Euclidean distance function given by (1).


$$L_1 = \sqrt{(Q_1 - P_1)^2 + (Q_2 - P_2)^2} \tag{1}$$


Similarly, the distance parameter $L_2$ is the distance between the minor-axis endpoints R and S,
designated by $(R_1, R_2)$ and $(S_1, S_2)$ respectively, expressed using the Euclidean distance metric
given by (2).


$$L_2 = \sqrt{(S_1 - R_1)^2 + (S_2 - R_2)^2} \tag{2}$$
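As a minimal sketch of how the projected area and the two axis lengths of equations (1) and (2) could be computed, the snippet below counts foreground pixels in a filled mask and applies the Euclidean distance to endpoint pairs. The helper names and the endpoint coordinates are hypothetical and are not taken from the paper.

```python
import math

def projected_area(mask):
    """Projected area: number of foreground pixels in the filled fruit region."""
    return sum(sum(1 for value in row if value) for row in mask)

def euclidean(p, q):
    """Euclidean distance between two (x, y) points, as in equations (1) and (2)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical endpoints in pixel coordinates (illustrative values only).
P, Q = (10, 50), (110, 50)   # major-axis endpoints
R, S = (60, 10), (60, 90)    # minor-axis endpoints

L1 = euclidean(P, Q)         # major-axis length in pixels
L2 = euclidean(R, S)         # minor-axis length in pixels
area = projected_area([[0, 1, 1], [1, 1, 1], [0, 1, 0]])
```

All lengths are in pixels. Because the camera height is fixed at 0.5 meters, a constant pixel-to-millimeter scale would apply, but the regression can also absorb that scale directly.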


2.4.1. Handcrafted features 

Along with the conventional geometrical measurements employed for feature extraction of Citrus limetta,
two novel distance-based parameters, top parallel chord length ($\lambda_1$) and bottom parallel chord
length ($\lambda_2$), are also extracted. They are defined as the Euclidean lengths of the chords that
are parallel to the major axis and lie at a distance of three-fourths of the semi-minor-axis length
($\frac{3}{4} \times \frac{L_2}{2}$) from the center of the ellipse $(x_0, y_0)$, assuming the major axis
is in the horizontal orientation. $\lambda_1$ is the Euclidean distance between the endpoints of the top
parallel chord, A and B, designated by $(A_1, A_2)$ and $(B_1, B_2)$ respectively, and is given by (3),
as illustrated in Figure 3.


$$\lambda_1 = \sqrt{(B_1 - A_1)^2 + (B_2 - A_2)^2} \tag{3}$$


$\lambda_2$ is the Euclidean distance between the endpoints of the bottom parallel chord, C and D,
designated by $(C_1, C_2)$ and $(D_1, D_2)$ respectively, and is given by (4), as illustrated in Figure 3.


$$\lambda_2 = \sqrt{(D_1 - C_1)^2 + (D_2 - C_2)^2} \tag{4}$$
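One possible way to measure the two parallel chords on a binary mask is sketched below, assuming the fruit region is filled and its major axis is already horizontal. The function names and the synthetic ellipse are illustrative assumptions. For a perfect ellipse the chord at three-fourths of the semi-minor axis has the closed-form length $L_1\sqrt{1 - (3/4)^2} = L_1\sqrt{7}/4 \approx 0.661\,L_1$, which the sketch uses as a sanity check.

```python
import math

def chord_length(mask, row):
    """Distance between the leftmost and rightmost foreground pixels of one row."""
    xs = [x for x, value in enumerate(mask[row]) if value]
    return float(xs[-1] - xs[0]) if xs else 0.0

def parallel_chords(mask, center_row, minor_axis_length):
    """Top and bottom chord lengths at 3/4 of the semi-minor axis from the center."""
    offset = round(0.75 * minor_axis_length / 2)
    return (chord_length(mask, center_row - offset),
            chord_length(mask, center_row + offset))

# Synthetic filled ellipse (semi-axes a=100, b=80 pixels) standing in for a fruit mask.
a, b, cx, cy = 100, 80, 120, 100
mask = [[1 if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1 else 0
         for x in range(2 * cx + 1)] for y in range(2 * cy + 1)]

top, bottom = parallel_chords(mask, center_row=cy, minor_axis_length=2 * b)
ideal = 2 * a * math.sqrt(1 - 0.75 ** 2)  # closed form for a perfect ellipse
```

On the synthetic ellipse both chords agree with the closed form to within one pixel. On a real fruit outline the two chords generally differ from this ideal value and from each other, which is presumably the extra shape information the handcrafted features are meant to capture.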


2.5. Statistical correlation test and feature analysis 

Pearson’s statistical correlation test was performed using RStudio (Version 1.4.1106) to determine
the linear dependence between the dependent feature ($M$) and the extracted independent features
($A_p$, $B$, $L_1$, $L_2$, $\lambda_1$, $\lambda_2$). According to the results, the correlation
coefficients are close to unity, indicating a strong correlation between the independent features and
the dependent feature ($M$). The two handcrafted features had correlation coefficients of 0.95, and the
conventional features had correlation coefficients of 0.98 or 0.99. Based on the regression analysis
performed using the lm (linear model) function in RStudio, the p-values of the individual independent
features for predicting the dependent feature were less than 0.01. Therefore, the six extracted features
were considered for the proposed model configuration, as they were found statistically significant and
viable for estimating the weight. To observe the impact of the handcrafted features on model performance,
a conventional features configuration is also considered, which uses only projected area ($A_p$),
boundary length ($B$), major-axis length ($L_1$) and minor-axis length ($L_2$) as independent features
for the prediction of weight.
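The correlation test itself was run in RStudio. As a language-neutral illustration of what the Pearson coefficient measures, a minimal Python sketch follows. The feature and mass values are made up for the example and are not drawn from the paper's dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values (not from the paper's dataset).
major_axis_px = [210.0, 232.0, 251.0, 268.0, 290.0]
mass_g = [118.0, 150.0, 181.0, 214.0, 262.0]

r = pearson_r(major_axis_px, mass_g)
```

A value of `r` close to 1 indicates that the feature grows almost linearly with mass, which is the property the paper relies on when selecting features for the regression models.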


2.6. Prediction modeling 

To estimate the mass of the Citrus limetta from the extracted features, two regression models were
explored, namely multiple linear regression and support vector regression with linear, polynomial, and
RBF kernels. The models were implemented in Python (Python 3.7.5) using the scikit-learn library
(version 1.0.1). A k-fold cross-validation (k=10) method was used to validate the models. The k-fold
cross-validation method divides the randomized dataset into k folds; one fold is used for testing the
model and the remaining folds are used for training, and the evaluation scores over all k unique
train-test groups are averaged as the result. k-fold cross-validation is mainly used in machine learning
to gauge the capabilities of a model on unseen data. The hyperparameters of the SVR models were found
using the grid search method, which searches for the optimal parameters among the combinations of a given
parameter set using 10-fold cross-validation; the results are shown in Table 2. The dataset consists of
250 samples, divided into a training dataset of 150 samples (60%) and a testing dataset of 100 samples
(40%). The dataset was synthetically increased from 200 samples to 250 samples using the reweighting
technique.
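The data partitioning described above can be illustrated with a small standard-library sketch. The function names, the fixed seed and the shuffling strategy are assumptions made for the example. The paper's implementation used scikit-learn, whose `KFold` and `GridSearchCV` classes provide equivalent functionality.

```python
import random

def train_test_split_indices(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

train_idx, test_idx = train_test_split_indices(250, 150)  # 60% / 40% hold-out
folds = list(k_fold_indices(250, k=10))                   # 10 folds of 25 samples
```

With 250 samples and k=10, every fold tests on 25 samples and trains on the remaining 225. A model would be fitted once per fold and the ten scores averaged.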


Table 2. Support vector regression (SVR) hyperparameters for various kernels

Model | C | ε | Degree
Linear-SVR | 0.001 | 10 | -
Polynomial-SVR | 150 | 8 | 1
RBF-SVR | 750 | 1 | -


3. RESULTS AND DISCUSSION 

The models were evaluated on the testing dataset using several evaluation metrics, namely R², mean
absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE), with two different
configurations; the results are presented in Table 3. The models were validated using the k-fold
cross-validation technique (k=10) on the entire dataset, and the results of the validation tests are
displayed in Table 4. Figure 4 depicts scatter plots of the relationship between the estimated mass and
the measured mass: Figure 4(a) for the multiple linear regression (MLR) model, Figure 4(b) for the linear
support vector regression (SVR) model, Figure 4(c) for the polynomial SVR model and Figure 4(d) for the
radial basis function (RBF) SVR model. The results indicate that the models were accurate in estimating
the weights of the Citrus limetta, with minor differences between the estimated and measured weights.
The models performed close to each other, and of all the models investigated the RBF-SVR model shows the
best performance in both configurations, with an R² value of 0.98151, an average accuracy of 96.614% and
an average relative error of 3.386% on the test dataset with the proposed configuration. The linear-SVR,
polynomial-SVR and multiple linear regression models showed average relative errors of 3.936%, 3.731%
and 3.846% and average accuracies of 96.064%, 96.269% and 96.154% respectively in the proposed
conventional with handcrafted features configuration.
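The evaluation metrics named above can be computed as in the following sketch. The paper does not spell out how the average relative error is defined. The sketch assumes it is the mean absolute percentage error and that accuracy is its complement, which is consistent with the reported accuracy and error pairs summing to 100%. The mass values are made up for illustration.

```python
import math

def regression_metrics(measured, predicted):
    """Return R^2, MAE, MSE, RMSE, mean relative error (%) and accuracy (%)."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - (mse * n) / ss_tot
    # Assumed definition: mean absolute percentage error relative to measured mass.
    rel_err = 100.0 * sum(abs(e) / m for e, m in zip(errors, measured)) / n
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": math.sqrt(mse),
            "relative_error_pct": rel_err, "accuracy_pct": 100.0 - rel_err}

# Made-up masses in grams, for illustration only.
measured = [100.0, 200.0, 250.0, 400.0]
predicted = [110.0, 190.0, 250.0, 380.0]
metrics = regression_metrics(measured, predicted)
```

MAE and RMSE are in grams while MSE is in squared grams, so RMSE is the more directly interpretable of the two squared-error measures.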


Table 3. Performance of the models to estimate the mass on test dataset with two configurations

Configuration | Regression model | R² | MAE (g) | MSE (g²) | RMSE (g)
Conventional features | MLR | 0.97914 | 8.05956 | 135.05266 | 11.62122
Conventional features | Linear-SVR | 0.97830 | 8.21393 | 140.52383 | 11.85427
Conventional features | Polynomial-SVR | 0.97922 | 8.05081 | 134.56667 | 11.60029
Conventional features | RBF-SVR | 0.97976 | 7.86438 | 125.08693 | 11.18423
Conventional with handcrafted features | MLR | 0.97923 | 8.20018 | 134.51022 | 11.59785
Conventional with handcrafted features | Linear-SVR | 0.97831 | 8.43624 | 140.46166 | 11.85165
Conventional with handcrafted features | Polynomial-SVR | 0.97930 | 8.02617 | 134.03429 | 11.57732
Conventional with handcrafted features | RBF-SVR | 0.98151 | 7.16062 | 119.73611 | 10.9424


Table 4. 10-fold cross-validation results for the models to estimate the mass with two configurations

Configuration | Regression model | R² | MAE (g) | MSE (g²) | RMSE (g)
Conventional features | MLR | 0.97912 | 7.67018 | 122.17714 | 11.05338
Conventional features | Linear-SVR | 0.97892 | 7.76509 | 123.55969 | 11.11574
Conventional features | Polynomial-SVR | 0.97913 | 7.64224 | 121.64801 | 11.02942
Conventional features | RBF-SVR | 0.97963 | 6.98280 | 119.57198 | 10.93490
Conventional with handcrafted features | MLR | 0.97938 | 7.83741 | 120.31454 | 10.96880
Conventional with handcrafted features | Linear-SVR | 0.97941 | 7.85050 | 120.17145 | 10.96227
Conventional with handcrafted features | Polynomial-SVR | 0.97921 | 7.63920 | 121.22217 | 11.01009
Conventional with handcrafted features | RBF-SVR | 0.98006 | 6.95012 | 114.59439 | 10.70488

[Figure 4 panels plot Predicted Mass against Measured Mass on the test dataset; recoverable panel titles: "SVR Linear Model - Conventional with Handcrafted Features (Test Dataset)" (b) and "SVR RBF Model - Conventional Features with Handcrafted Features (Test Dataset)"]

Figure 4. Scatter plots of relationship between the estimated and measured mass for regression models in
proposed configuration (a) MLR model (b) SVR Linear model (c) SVR polynomial model, and
(d) SVR RBF model

The SVR model with the RBF kernel similarly performed better than its counterparts in the conventional
features configuration. This shows that, of the models explored, the RBF-SVR model is the better model
for predicting the mass of Citrus limetta using the proposed geometrical features. This is further
supported by Table 4, as the RBF-SVR model consistently performed best in the 10-fold cross-validation
test. Compared with the conventional features configuration, the conventional with handcrafted features
configuration improved R² and RMSE for every model in both the test dataset and the 10-fold
cross-validation results. For the RBF-SVR model, the proposed conventional with handcrafted features
configuration reduced the root mean squared error on the test dataset by approximately 2.2% (from
11.18 g to 10.94 g) compared with the conventional features configuration.
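The size of this RMSE reduction can be checked against the RBF-SVR rows of Table 3. The paper does not state which value the change is normalized by, so both plausible choices are shown as a worked calculation.

```latex
\[
\frac{\mathrm{RMSE}_{\text{conv}} - \mathrm{RMSE}_{\text{prop}}}{\mathrm{RMSE}_{\text{prop}}}
  = \frac{11.18423 - 10.9424}{10.9424} \approx 0.0221,
\qquad
\frac{\mathrm{RMSE}_{\text{conv}} - \mathrm{RMSE}_{\text{prop}}}{\mathrm{RMSE}_{\text{conv}}}
  = \frac{11.18423 - 10.9424}{11.18423} \approx 0.0216
\]
```

Normalizing by the proposed configuration's RMSE gives about 2.21%, while normalizing by the conventional baseline gives about 2.16%. Either way the absolute improvement is roughly 0.24 g.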


4. CONCLUSION 

In this work, regression models, namely multiple linear regression and support vector regression with
linear, polynomial and RBF kernels, were explored and developed to estimate the weight of Citrus limetta
using acquired and pre-processed images of the fruit as input. Basic pre-processing techniques were
applied to extract the fruit region, and the geometrical features of the fruit were calculated from the
extracted region. Through a statistical correlation test it was affirmed that the extracted geometrical
features were viable and statistically significant for estimating the weight of the Citrus limetta. Of
the models explored, support vector regression with the RBF kernel performed best, with an R² of 0.98151,
an RMSE of 10.9424 grams, an average accuracy of 96.614% and an average relative error of 3.386% with the
proposed model configuration. The models were validated using k-fold cross-validation, and support vector
regression with the RBF kernel also performed best among the explored models in the k-fold
cross-validation test with all model configurations. It was observed that developing handcrafted features
that capture more information about the geometry of the fruit, and including them as independent features
along with the conventional features, consistently increased the performance of the models.